Locking in RAID storage systems

ABSTRACT

A method for regulating I/O requests in a RAID storage system may comprise: receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume; receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume; and queuing the second request. 
A system for regulating I/O requests in a RAID storage system may comprise: means for receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume; means for receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume; and means for queuing the second request.

BACKGROUND

A redundant array of inexpensive disks (RAID) storage subsystem may be responsible for management, processing and storage for input/output (I/O) requests from one or more hosts attached to the subsystem. While processing multiple requests simultaneously, it is desirable that the storage subsystem maintains integrity of data while processing host requests in a reasonable amount of time. One approach to ensuring data integrity is locking.

For example, in order to ensure that multiple host I/O requests do not conflict, locking solutions may lock (e.g. make the RAID volume inaccessible to more than one host) an entire logical RAID volume while one I/O request is being processed by a RAID controller. In another approach, locking may be on a RAID stripe basis, where an entire stripe is locked while an I/O request is being processed by a RAID controller. Locking may ensure that a host I/O request that accesses or updates data maintained on the RAID volume is completed without compromising the integrity of the data involved.

Further, RAID arrays involve an implicit parity generation for writes initiated by a host. Such parity generation operations require associated parity reads/writes across a given RAID stripe for which a write operation has occurred.

SUMMARY

Methods and systems for controlling access to a RAID storage system are presented.

A method for regulating I/O requests in a RAID storage system may comprise: receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume; receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume; and queuing the second request.

A system for regulating I/O requests in a RAID storage system may comprise: means for receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume; means for receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume; and means for queuing the second request.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not necessarily restrictive of the claims. The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate examples and, together with the general description, serve to explain the principles of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The numerous advantages of the disclosure may be better understood by those skilled in the art by reference to the accompanying figures in which:

FIG. 1 shows a high-level system diagram for a RAID.

FIG. 2 shows a high-level system diagram for a RAID.

FIG. 3 shows a high-level system diagram for a RAID.

FIG. 4 shows a high-level system diagram for a RAID.

FIG. 5 shows a high-level system diagram for a RAID.

FIG. 6 shows a high-level system diagram for a RAID.

DETAILED DESCRIPTION

In the following detailed description, reference may be made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here.

FIG. 1 illustrates a RAID storage system 100. The RAID storage system 100 may include a RAID volume 101, a RAID controller 102, and one or more host computing devices 103. The RAID volume 101 may include n+1 physical drives (e.g. Drive 0-Drive n).

Referring to FIG. 2, the physical drives may be partitioned into one or more data strips (e.g. D0₀, D1₀, D2₀ . . . Dn-1₀; D0₁, D1₁, D2₁ . . . Dn₁). A data strip may include a collection of physical logical block addresses (LBAs) on a given physical drive.

The RAID volume 101 may further include one or more parity strips (e.g. P₀, P₁). The data value maintained in a given parity strip may be a logical XOR of all data strips on that stripe (e.g. P0 = D0 XOR D1 XOR D2 XOR D3 XOR D4 XOR D5 XOR D6). Stripe 0 and stripe 1 represent the first two stripes of the logical volume. Stripes are in effect the rows.
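By way of a non-limiting illustration, the XOR relationship between the data strips and the parity strip of a stripe may be sketched as follows. The function names `xor_parity` and `rebuild_strip` and the two-byte strips are hypothetical and chosen only for this example; they do not appear in the disclosure.

```python
from functools import reduce
from operator import xor


def xor_parity(strips):
    """Return the byte-wise XOR of a list of equally sized strips."""
    if not strips:
        raise ValueError("at least one strip is required")
    length = len(strips[0])
    if any(len(strip) != length for strip in strips):
        raise ValueError("all strips must have the same length")
    # zip(*strips) yields one tuple of byte values per byte position.
    return bytes(reduce(xor, column) for column in zip(*strips))


def rebuild_strip(surviving_strips, parity):
    """Recover one missing data strip from the survivors plus parity."""
    return xor_parity(list(surviving_strips) + [parity])


# Three tiny example data strips (hypothetical contents).
data = [bytes([0x0F, 0xA0]), bytes([0xF0, 0x0A]), bytes([0x33, 0x55])]
p0 = xor_parity(data)
```

Because XOR is its own inverse, the same routine that generates the parity strip may also regenerate any single missing data strip from the remaining strips of the stripe.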

The RAID volume 101 may include one or more stripes (e.g. Stripe 1, Stripe 2). A data stripe may include one or more data strips along with one or more parity strips.

During normal operation of the RAID storage system 100, a write request to any data strip on a given stripe may trigger an update of the data in the addressed strip as well as an update of the one or more parity strips associated with the one or more stripes which include the addressed data strip.
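As a hedged sketch of why a data write also touches parity, the example below uses the commonly known read-modify-write identity (new parity = old parity XOR old data XOR new data). The disclosure does not state which parity update technique the RAID controller 102 uses, so this choice, together with the names `xor_bytes`, `full_parity` and `write_strip`, is an assumption made for illustration.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data_strips):
    """Recompute parity from scratch by XOR-ing every data strip."""
    parity = bytes(len(data_strips[0]))  # all-zero starting value
    for strip in data_strips:
        parity = xor_bytes(parity, strip)
    return parity


def write_strip(data_strips, parity, index, new_data):
    """Overwrite one data strip in place and return the updated parity.

    Assumption: read-modify-write, i.e. the parity is adjusted by the
    difference between the old and new contents of the addressed strip.
    """
    delta = xor_bytes(data_strips[index], new_data)
    data_strips[index] = new_data
    return xor_bytes(parity, delta)
```

The data strip and the parity strip must change together; if another request modified the same stripe between the two updates, the parity could no longer be trusted, which is the hazard the locking described below is meant to avoid.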

In a multi-host configuration, it may be the case that multiple hosts (e.g. host computing devices 103) attempt to access a given data strip in a substantially simultaneous manner. However, as referenced above, in order to maintain data integrity of the RAID volume 101, this access may only be granted in a manner such that only one of the host computing devices 103 may modify a given data or parity strip at a time.

As such, upon receipt of an I/O request from a given host computing device 103 to access (e.g. read and/or write operations) one or more data strips, various portions of the RAID volume 101 may be locked so as to restrict access by any other host computing device (or subsequent I/O request by the original requesting host computing device) to those portions of the RAID volume 101.

Referring to FIG. 3, an LBA-level logical locking (L-Lock) methodology is illustrated. The RAID volume 101 may receive an I/O request from one or more host computing devices 103 directed to one or more data strips. For example, as shown in FIG. 3, the RAID volume 101 may receive an I/O request from one or more host computing devices 103 directed to data strip D0₀ and a portion of data strip D1₀.

The shaded regions of data strip D0₀ and data strip D1₀ denote LBAs addressed by the I/O request from the host computing device 103A. L-Lock 1 may be applied to only the LBAs addressed by the I/O request, and all other LBAs on the logical volume may remain free to be accessed by any other I/O request (e.g. subsequent requests to modify (e.g. write to) the data maintained in the LBAs associated with L-Lock 1 will be queued until processing of the first request is completed). Further, the shaded region of data strip D2₀ denotes LBAs addressed by a second I/O request either from the same host computing device 103A or a second host computing device 103B. L-Lock 2 may be applied to only the LBAs addressed by the second I/O request, and all other LBAs on the logical volume may remain free to be accessed by any other I/O request (e.g. subsequent requests to modify (e.g. write to) the data maintained in the LBAs associated with L-Lock 2 will be queued until processing of the second request is completed).
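The L-Lock behavior described above may be sketched, under simplifying assumptions, as a table of locked LBA ranges with a queue for conflicting requests. The names `Request`, `overlaps` and `LLockTable` are hypothetical, and for brevity the sketch treats every overlapping request as conflicting, without distinguishing reads from writes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    first_lba: int
    last_lba: int  # inclusive


def overlaps(a, b):
    """True when two inclusive LBA ranges share at least one LBA."""
    return a.first_lba <= b.last_lba and b.first_lba <= a.last_lba


class LLockTable:
    """Grants an L-Lock on exactly the LBAs a request addresses."""

    def __init__(self):
        self.active = []        # requests currently holding an L-Lock
        self.pending = deque()  # requests queued behind a conflict

    def submit(self, req):
        """Return True if the lock is granted, False if the request is queued."""
        if any(overlaps(req, held) for held in self.active):
            self.pending.append(req)
            return False
        self.active.append(req)
        return True
```

Only the addressed LBAs are locked, so a request directed to any other range of the logical volume is granted immediately rather than waiting behind an unrelated I/O.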

When multiple I/O requests are received which do not require modification of a parity strip (e.g. operations which do not modify the contents of the addressed LBAs, such as read operations, write-back write operations, and the like), such operations may be processed in a substantially simultaneous manner (e.g. at least part of the processing interval associated with a first operation may overlap temporally with at least part of the processing interval of a second operation). For example, as shown in FIG. 4, a first I/O request may be addressed to LBAs of data strip D0₀, data strip D1₀ and data strip D2₀, resulting in L-Lock 1. A second I/O request may be addressed to LBAs of data strip D2₀, resulting in L-Lock 2. If the I/O request addressed to LBAs of data strip D0₀, data strip D1₀ and data strip D2₀ is a read operation and the I/O request addressed to the data strip D2₀ is a write-back write operation to a write-back cache, both I/O requests may be processed at least partially simultaneously. Alternately, if the I/O request addressed to LBAs of data strip D0₀, data strip D1₀ and data strip D2₀ is a write operation and the I/O request addressed to the data strip D2₀ is a read operation, the requests will be processed in the order in which they are received, with the later-in-time request being queued until completion of the first request.
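The rule for overlapping requests may be summarized in a small decision function. This is a sketch only: the enumeration `Op`, the member `PARITY_WRITE` (standing in for any write that requires a parity strip update) and the function `may_overlap` are hypothetical names, not terms of the disclosure.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE_BACK_WRITE = "write-back write"
    PARITY_WRITE = "write requiring a parity update"  # assumed category


# Operations that, per the description, do not modify a parity strip.
NO_PARITY_UPDATE = {Op.READ, Op.WRITE_BACK_WRITE}


def may_overlap(first, second):
    """True if two requests on overlapping LBAs may run concurrently.

    Both must leave parity untouched; otherwise the later request is
    queued until the earlier one completes.
    """
    return first in NO_PARITY_UPDATE and second in NO_PARITY_UPDATE
```

With this rule, the read and write-back write of FIG. 4 may proceed at least partially simultaneously, whereas a parity-updating write followed by a read is serviced in arrival order.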

However, L-Locks may not be granted on a region while a physical lock (P-Lock) is active on that region or a part of that region, as will be discussed further below. The RAID controller 102 may maintain state information regarding the various types of locks (L-Lock or P-Lock) applied to various LBAs by each I/O request that is being processed by the RAID volume 101, and conflicting requests may be queued.

Referring to FIG. 5, a P-Lock methodology is illustrated. A P-Lock methodology may include locking based on the physical layout of LBAs (e.g. a layout mapped on the RAID subsystem's internal view of the LBAs, which is the physical distribution on the disks). Certain requests by the host computing devices 103 or various operations internal to the RAID volume 101 (e.g. consistency checks, rebuild/reconstruction operations, etc.) that require access to a range of LBAs spanning numerous physical drives, thereby necessitating the update of parity strips P0 and/or P1, may initiate the locking of an entire stripe (e.g. Stripe 1 or Stripe 2) so as to maintain data integrity.

The shaded region of data strip D0₀ and data strip D1₀ denotes the LBAs addressed by the I/O request from a host computing device 103. P-Locks may be applied according to the physical view of the drives as seen internally by the RAID controller 102. The RAID controller 102 may lock an entire stripe for the range of physical LBAs in the I/O request. For example, when an I/O request is received accessing data strip D0₀ and a portion of data strip D1₀, a P-Lock 1 may be applied to data strips D0₀ to Dn-1₀ and P0 of Stripe 0 in order to maintain data integrity. As such, any I/O request attempting to access blocks D0₀ to Dn-1₀ may be required to wait in queue for the P-Lock 1 to be released. Any other range of LBAs that lies outside of this P-Locked region (e.g. Stripe 2) may continue to be accessed in parallel by the host computing devices 103. P-Locks may be applied to write-through logical volumes, degraded RAID5/RAID6 logical volumes, partially degraded RAID6 logical volumes and any internal operations (e.g. recoveries, consistency check, rebuild, reconstruction, patrol read, etc.). I/O requests addressing multiple data strips spanning across the aforementioned cases may require a P-Lock before that I/O request can be serviced. P-Locks may not be granted for a region if any other lock is currently active (either L or P) on that region or a part of that region.
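A minimal sketch of the P-Lock grant rule follows. It assumes a simple layout in which each stripe holds a fixed number of consecutive logical blocks; the constants `STRIP_BLOCKS` and `DATA_STRIPS`, the `Lock` record and the functions `p_lock_range` and `try_p_lock` are all hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass

STRIP_BLOCKS = 128                         # assumed blocks per data strip
DATA_STRIPS = 4                            # assumed data strips per stripe
STRIPE_BLOCKS = STRIP_BLOCKS * DATA_STRIPS


@dataclass(frozen=True)
class Lock:
    kind: str       # "L" or "P"
    first_lba: int
    last_lba: int   # inclusive


def p_lock_range(first_lba, last_lba):
    """Widen an LBA range to cover every stripe it touches in full."""
    first_stripe = first_lba // STRIPE_BLOCKS
    last_stripe = last_lba // STRIPE_BLOCKS
    return first_stripe * STRIPE_BLOCKS, (last_stripe + 1) * STRIPE_BLOCKS - 1


def try_p_lock(active, first_lba, last_lba):
    """Grant a P-Lock only if no L-Lock or P-Lock overlaps the stripes.

    Returns the new Lock (also appended to `active`), or None when the
    request would have to wait in queue.
    """
    lo, hi = p_lock_range(first_lba, last_lba)
    for held in active:
        if held.first_lba <= hi and lo <= held.last_lba:
            return None
    lock = Lock("P", lo, hi)
    active.append(lock)
    return lock
```

Widening the range to whole stripes is what distinguishes a P-Lock from an L-Lock in this sketch: a request that addresses only part of a stripe still blocks every other request directed to that stripe, while requests to other stripes proceed in parallel.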

Any lock that has been acquired must be released once the I/O processing is complete for the I/O request that initiates a lock. Whenever a lock is released, the RAID controller 102 may check the queue of pending I/O requests (from a host or internal to the RAID controller 102) on the region or part of region currently being released from the lock. If an I/O request is pending for the region which has been released, RAID controller 102 may grant a lock to the pending request only if there are no conflicting locks already in place.

Referring again to FIG. 5, P-Lock 1 and P-Lock 2 may lock Stripe 1 and Stripe 2, respectively. In the case where an I/O request addressing one or more data strips of Stripe 1 and Stripe 2 is waiting in queue, both P-Lock 1 and P-Lock 2 must be released prior to processing the I/O request. Should P-Lock 2 be released, the I/O waiting in queue may not be processed if the remaining lock P-Lock 1 maintains some degree of overlap in I/O range with P-Lock 2. Only when P-Lock 1 is released may the I/O request waiting in queue be granted access.
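The release-and-recheck behavior, including the case of a queued request that spans two locked stripes, may be sketched as follows. The names `Request`, `overlaps` and `LockManager`, and the 512-block stripe size used in the usage example, are assumptions made for illustration only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    first_lba: int
    last_lba: int  # inclusive


def overlaps(a, b):
    return a.first_lba <= b.last_lba and b.first_lba <= a.last_lba


class LockManager:
    def __init__(self):
        self.active = []        # requests currently holding a lock
        self.pending = deque()  # requests waiting for a conflicting lock

    def submit(self, req):
        """Grant the lock if nothing conflicts; otherwise queue the request."""
        if any(overlaps(req, held) for held in self.active):
            self.pending.append(req)
            return False
        self.active.append(req)
        return True

    def release(self, req):
        """Drop a lock, then re-check every queued request in arrival order.

        Returns the list of requests that were granted as a result.
        """
        self.active.remove(req)
        granted, still_waiting = [], deque()
        while self.pending:
            waiter = self.pending.popleft()
            if any(overlaps(waiter, held) for held in self.active):
                still_waiting.append(waiter)
            else:
                self.active.append(waiter)
                granted.append(waiter)
        self.pending = still_waiting
        return granted
```

In this sketch a queued request is granted only when no remaining lock overlaps any part of its range, so a request spanning two locked stripes stays queued until both locks are released. Note that the re-check allows a later arrival to be granted ahead of an earlier one that is still blocked; a controller requiring strict ordering would stop at the first blocked request instead.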

It may also be the case that a host read to one or more data strips must be recovered (e.g. recovery from a media error). In general, a read operation may require the application of an L-Lock to the specific LBAs of data strip portions addressed by the host read request. However, upon detection of a media error resulting in the failure of the read request, in order to service the read request, the media error LBA may need to be recovered through use of the parity data maintained on a parity strip (e.g. P₀). In such a case, an L-Lock may be promoted to a P-Lock.

Referring to FIG. 6, when an I/O request is directed to data strip D0₀ and data strip D1₀, as depicted in the shaded region of Stripe 1, an L-Lock 1 on the addressed LBAs may be established (e.g. State A). While the I/O request is being serviced, a media error may be detected on Drive 1 or Drive 2 which disrupts the I/O request. Such a failure may necessitate the recovery of the data on the media error LBA via common RAID recovery processes. As such recovery processes may require data from other drives as well (i.e. strips D2₀ . . . Dn-1₀, P₀), the L-Lock 1 previously established for data strip D0₀ and data strip D1₀ may be promoted to a P-Lock 1 across the entirety of Stripe 1 (e.g. State B), so as to maintain the consistency of the data on all drives during the recovery. Once the L-Lock 1 is promoted to P-Lock 1, no other I/O request may update any physical LBA on Stripe 1. Upon completion of the recovery operation, the P-Lock 1 of State B may revert back to L-Lock 1 of State A.
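The State A to State B transition and back may be sketched as below. The `HeldLock` record, the functions `promote` and `demote`, and the stripe size `STRIPE_BLOCKS` are hypothetical; the sketch also omits any check for other locks already held on the stripe at the time of promotion, which the description does not detail.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

STRIPE_BLOCKS = 512  # assumed number of logical blocks per stripe


@dataclass
class HeldLock:
    kind: str                  # "L" (State A) or "P" (State B)
    first_lba: int
    last_lba: int              # inclusive
    saved_range: Optional[Tuple[int, int]] = None


def promote(lock):
    """Widen an L-Lock to a P-Lock over every stripe it touches."""
    if lock.kind != "L":
        raise ValueError("only an L-Lock can be promoted")
    lock.saved_range = (lock.first_lba, lock.last_lba)
    lock.first_lba = (lock.first_lba // STRIPE_BLOCKS) * STRIPE_BLOCKS
    lock.last_lba = (lock.last_lba // STRIPE_BLOCKS + 1) * STRIPE_BLOCKS - 1
    lock.kind = "P"


def demote(lock):
    """After recovery completes, shrink back to the original L-Lock range."""
    if lock.kind != "P" or lock.saved_range is None:
        raise ValueError("lock was not promoted from an L-Lock")
    lock.first_lba, lock.last_lba = lock.saved_range
    lock.saved_range = None
    lock.kind = "L"
```

Remembering the original LBA range at promotion time is what allows the lock to revert to State A once the recovery operation has finished.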

The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples may be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, several portions of the subject matter described herein may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, may be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of skill in the art in light of this disclosure.

In addition, those skilled in the art will appreciate that the mechanisms of the subject matter described herein may be capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing medium used to actually carry out the distribution. Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link (e.g., transmitter, receiver, transmission logic, reception logic, etc.), etc.).

Those having skill in the art will recognize that the state of the art has progressed to the point where there is little distinction left between hardware, software, and/or firmware implementations of aspects of systems; the use of hardware, software, and/or firmware is generally (but not always, in that in certain contexts the choice between hardware and software may become significant) a design choice representing cost vs. efficiency tradeoffs. Those having skill in the art will appreciate that there are various vehicles by which processes and/or systems and/or other technologies described herein may be effected (e.g., hardware, software, and/or firmware), and that the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; alternatively, if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware. Hence, there are several possible vehicles by which the processes and/or devices and/or other technologies described herein may be effected, none of which is inherently superior to the other in that any vehicle to be utilized is a choice dependent upon the context in which the vehicle will be deployed and the specific concerns (e.g., speed, flexibility, or predictability) of the implementer, any of which may vary. Those skilled in the art will recognize that optical aspects of implementations will typically employ optically-oriented hardware, software, and/or firmware.

It is believed that the present invention and many of its attendant advantages will be understood by the foregoing description. It is also believed that it will be apparent that various changes may be made in the form, construction and arrangement of the components thereof without departing from the scope and spirit of the invention or without sacrificing all of its material advantages. The form hereinbefore described is merely an explanatory embodiment thereof. It is the intention of the following claims to encompass and include such changes.

1. A method for regulating I/O requests in a redundant array of inexpensive discs (RAID) volume, the method comprising: receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume; receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume; and queuing the second request.
 2. The method of claim 1, further comprising: processing the first request to access the first set of one or more LBAs; and processing the second request to access at least one of the first set of one or more LBAs of the RAID volume.
 3. The method of claim 1, further comprising: receiving a third request to access a second set of one or more LBAs which are not included in the first set of one or more LBAs of the RAID volume; and processing the first request and third request.
 4. The method of claim 1, wherein the receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume comprises: receiving a first request to access a first set of one or more LBAs of a RAID volume, the first request not requiring access to a parity strip.
 5. The method of claim 4, further comprising: receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume, the second request not requiring access to a parity strip; and processing the first request and the second request, at least partially simultaneously.
 6. The method of claim 1, further comprising: receiving a request requiring access to a parity strip; receiving a request to access one or more LBAs associated with a stripe including the parity strip; and queuing the request to access the one or more LBAs associated with the stripe including the parity strip.
 7. The method of claim 6, wherein the receiving a request requiring access to a parity strip further comprises: receiving a request selected from the group comprising: a request to recover one or more data strips associated with the stripe including the parity strip; a request to reconstruct the stripe including the parity strip; a request to complete a consistency check on one or more data strips associated with the stripe including the parity strip; and a request to carry out a patrol read on one or more data strips associated with the stripe including the parity strip data.
 8. The method of claim 6, wherein the receiving a request requiring access to a parity strip further comprises: receiving a first request to access a first parity strip and a second parity strip.
 9. The method of claim 8, further comprising receiving a second request to access the first parity strip; receiving a request to access one or more LBAs associated with the stripe including the first parity strip; queuing the request to access the one or more LBAs associated with the stripe including the first parity strip; processing the first request to access the first parity strip and the second parity strip; processing the second request to access the first parity strip; and processing the request to access the one or more LBAs associated with the stripe including the first parity strip.
 10. The method of claim 1, further comprising: detecting a media error in processing the first request to access the first set of one or more LBAs of a stripe of the RAID; receiving a request to access one or more LBAs of the stripe of the RAID volume that are not in the first set of one or more LBAs; queuing one or more requests to access one or more LBAs of the stripe of the RAID volume; recovering one or more LBAs of the first set of one or more LBAs; and processing the request to access one or more LBAs of the stripe of the RAID volume that are not in the first set of one or more LBAs.
 11. A system for regulating I/O requests in a redundant array of inexpensive discs (RAID) volume, the system comprising: means for receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume; means for receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume; and means for queuing the second request.
 12. The system of claim 11, further comprising: means for processing the first request to access the first set of one or more LBAs; and means for processing the second request to access at least one of the first set of one or more LBAs of the RAID volume.
 13. The system of claim 11, further comprising: means for receiving a third request to access a second set of one or more LBAs which are not included in the first set of one or more LBAs of the RAID volume; and means for processing the first request and third request.
 14. The system of claim 11, wherein the means for receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume comprises: means for receiving a first request to access a first set of one or more LBAs of a RAID volume, the first request not requiring access to a parity strip.
 15. The system of claim 14, further comprising: means for receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume, the second request not requiring access to a parity strip; and means for processing the first request and the second request, at least partially simultaneously.
 16. The system of claim 11, further comprising: means for receiving a request requiring access to a parity strip; means for receiving a request to access one or more LBAs associated with a stripe including the parity strip; and means for queuing the request to access the one or more LBAs associated with the stripe including the parity strip.
 17. The system of claim 16, wherein the means for receiving a request to access one or more LBAs associated with a stripe including the parity strip further comprises: means for receiving a request selected from the group comprising: a request to recover one or more data strips associated with the stripe including the parity strip; a request to reconstruct the stripe including the parity strip; a request to complete a consistency check on one or more data strips associated with the stripe including the parity strip; and a request to carry out a patrol read on one or more data strips associated with the stripe including the parity strip data.
 18. The system of claim 16, wherein the means for receiving a request to access one or more LBAs associated with a stripe including the parity strip further comprises: means for receiving a first request to access a first parity strip and a second parity strip.
 19. The system of claim 18, further comprising means for receiving a second request to access the first parity strip; means for receiving a request to access one or more LBAs associated with the stripe including the first parity strip; means for queuing the request to access the one or more LBAs associated with the stripe including the first parity strip; means for processing the first request to access the first parity strip and the second parity strip; means for processing the second request to access the first parity strip; and means for processing the request to access the one or more LBAs associated with the stripe including the first parity strip.
 20. The system of claim 11, further comprising: means for detecting a media error in processing the first request to access the first set of one or more LBAs of a stripe of the RAID; means for receiving a request to access one or more LBAs of the stripe of the RAID volume that are not in the first set of one or more LBAs; means for queuing one or more requests to access one or more LBAs of the stripe of the RAID volume; means for recovering one or more LBAs of the first set of one or more LBAs; and means for processing the request to access one or more LBAs of the stripe of the RAID volume that are not in the first set of one or more LBAs.
 21. A system for regulating I/O requests in a redundant array of inexpensive discs (RAID) volume, the system comprising: circuitry for receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume; circuitry for receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume; and circuitry for queuing the second request.
 22. The system of claim 21, further comprising: circuitry for processing the first request to access the first set of one or more LBAs; and circuitry for processing the second request to access at least one of the first set of one or more LBAs of the RAID volume.
 23. The system of claim 21, further comprising: circuitry for receiving a third request to access a second set of one or more LBAs which are not included in the first set of one or more LBAs of the RAID volume; and circuitry for processing the first request and third request.
 24. The system of claim 21, wherein the circuitry for receiving a first request to access a first set of one or more logical block addresses (LBAs) of a RAID volume comprises: circuitry for receiving a first request to access a first set of one or more LBAs of a RAID volume, the first request not requiring access to a parity strip.
 25. The system of claim 24, further comprising: circuitry for receiving a second request to access at least one of the first set of one or more LBAs of the RAID volume, the second request not requiring access to a parity strip; and circuitry for processing the first request and the second request, at least partially simultaneously.
 26. The system of claim 21, further comprising: circuitry for receiving a request requiring access to a parity strip; circuitry for receiving a request to access one or more LBAs associated with a stripe including the parity strip; and circuitry for queuing the request to access the one or more LBAs associated with the stripe including the parity strip.
 27. The system of claim 26, wherein the circuitry for receiving a request to access one or more LBAs associated with a stripe including the parity strip further comprises: circuitry for receiving a request selected from the group comprising: a request to recover one or more data strips associated with the stripe including the parity strip; a request to reconstruct the stripe including the parity strip; a request to complete a consistency check on one or more data strips associated with the stripe including the parity strip; and a request to carry out a patrol read on one or more data strips associated with the stripe including the parity strip data.
 28. The system of claim 26, wherein the circuitry for receiving a request to access one or more LBAs associated with a stripe including the parity strip further comprises: circuitry for receiving a first request to access a first parity strip and a second parity strip.
 29. The system of claim 28, further comprising circuitry for receiving a second request to access the first parity strip; circuitry for receiving a request to access one or more LBAs associated with the stripe including the first parity strip; circuitry for queuing the request to access the one or more LBAs associated with the stripe including the first parity strip; circuitry for processing the first request to access the first parity strip and the second parity strip; circuitry for processing the second request to access the first parity strip; and circuitry for processing the request to access the one or more LBAs associated with the stripe including the first parity strip.
 30. The system of claim 21, further comprising: circuitry for detecting a media error in processing the first request to access the first set of one or more LBAs of a stripe of the RAID; circuitry for receiving a request to access one or more LBAs of the stripe of the RAID volume that are not in the first set of one or more LBAs; circuitry for queuing one or more requests to access one or more LBAs of the stripe of the RAID volume; circuitry for recovering one or more LBAs of the first set of one or more LBAs; and circuitry for processing the request to access one or more LBAs of the stripe of the RAID volume that are not in the first set of one or more LBAs. 